Using corpora tools to analyze gradable nouns in Dutch

Authors

  • Nick Ruiz
  • Edgar Weiffenbach
Abstract

In this paper, we expand on Morzycki's (2009) claim that degree readings of size adjectives are attributable to syntax. We introduce a corpus-based analysis of Dutch to verify his claim and extend it into the semantic domain. Using the LASSY Treebank, we extract syntactic and semantic properties of noun phrases containing the adjectives “gigantisch”, “kolossaal”, and “reusachtig”, and manually annotate each adjective-noun pair as gradable or nongradable. Using these features, we construct a statistical model based on logistic regression and find that grammatical role, definiteness, and particular semantic noun groups derived from Cornetto (a Dutch WordNet with referential relations) have a significant effect on the likelihood that a reader interprets an adjective-noun pair as having a degree reading.
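The modelling step described in the abstract can be illustrated with a minimal sketch. It assumes each annotated adjective-noun pair is reduced to three categorical features (grammatical role, definiteness, semantic noun group) and a binary gradable label. The rows, category names, learning rate, and helper functions below are hypothetical and are not taken from the paper; the sketch fits a plain logistic regression by gradient descent and does not reproduce the authors' significance testing.

```python
import math

# Hypothetical toy annotations (NOT data from the paper). Each row is
# (grammatical role, definiteness, semantic group, label); label 1 = degree reading.
ROWS = [
    ("predicate", "indefinite", "abstract", 1),
    ("predicate", "indefinite", "abstract", 1),
    ("object", "indefinite", "abstract", 1),
    ("predicate", "definite", "abstract", 1),
    ("subject", "definite", "artifact", 0),
    ("object", "definite", "artifact", 0),
    ("subject", "definite", "artifact", 0),
    ("object", "indefinite", "artifact", 0),
]


def build_index(rows):
    """Assign one feature-vector position to every (column, value) pair seen."""
    index = {}
    for row in rows:
        for col, value in enumerate(row[:-1]):
            index.setdefault((col, value), len(index))
    return index


def encode(features, index):
    """One-hot encode a tuple of categorical features; unseen values are ignored."""
    vec = [0.0] * len(index)
    for col, value in enumerate(features):
        pos = index.get((col, value))
        if pos is not None:
            vec[pos] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(w * x for w, x in zip(weights, xi))) - yi
            for j, x in enumerate(xi):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(features, index, weights, bias):
    """Estimated probability that the pair receives a degree reading."""
    x = encode(features, index)
    return sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))


INDEX = build_index(ROWS)
X = [encode(row[:-1], INDEX) for row in ROWS]
y = [row[-1] for row in ROWS]
WEIGHTS, BIAS = fit(X, y)

# Example query: an indefinite predicate noun from the hypothetical "abstract" group.
print(predict_proba(("predicate", "indefinite", "abstract"), INDEX, WEIGHTS, BIAS))
```

In practice a statistics package would normally be used instead, since it also reports standard errors and p-values for each coefficient, which is what a claim of a significant effect rests on.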

Similar resources

Deriving de/het gender classification for Dutch nouns for rule-based MT generation tasks

Linguistic resources available in the public domain, such as lemmatisers, part-of-speech taggers and parsers, can be used for the development of MT systems: as separate processing modules or as annotation tools for the training corpus. For SMT this annotation is used for training factored models, and for the rule-based systems a linguistically annotated corpus is the basis for creating analysis, ge...

The Other Pole of Degree Modification of Gradable Nouns by Size Adjectives: A Mandarin Chinese Perspective

Size adjectives can have degree readings when they modify gradable nouns. However, a cross-linguistic variation exists with respect to what type(s) of size adjectives in a particular language can have such readings. In English degree readings are available only for size adjectives that predicate bigness, and in Mandarin Chinese degree readings are available for all size adjectives irrespective ...

Crosslingual Countability Classification: English meets Dutch

This paper presents a range of methods for classifying Dutch nouns as countable, uncountable or plural only based on both Dutch and English data. The classification is based on the occurrence of countability specific linguistic features that are extracted from unannotated corpora. We show that in the absence of reliable Dutch gold standard data, cross-linguistic classification can be achieved o...

Semantic Clustering in Dutch: Automatically inducing semantic classes from large-scale corpora

Handcrafting semantic classes is a difficult and time-consuming job, and depends on human interpretation. Possible machine learning techniques would be much faster, and do not rely on interpretation, because they stick to the data. The goal of this research is to present some machine learning techniques that make it possible to achieve an automatic clustering of Dutch words. More particularly, ...

Semantics-based Multiword Expression Extraction

This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims at capturing the non-compositionality of MWEs; the intuition is that a noun within a MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributio...

Publication date: 2011